Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) EncodingManager refactoring #1112

Closed
wants to merge 38 commits into from
Closed

Conversation

karussell
Copy link
Member

@karussell karussell commented Jul 13, 2017

This is a first sketch of a new approach to improve the flexibility for the edge property storage and it fixes #472, #728, #978 and probably some more issues.

For the moment just look at the nice FastestCarWeighting and how we fetch the values in the constructor that are necessary in the calcWeight method, really easy and understandable:

double speed = edgeState.get(averageSpeed);

These values are composed upfront and are responsible just for a value (and optionally a reverse value), have a look into OSMReaderTest.testEncodedValueBasedEncodingManager for a complete test parsing an OSM file.

So all in all I was able to remove the necessity of the FlagEncoder classes. So just EncodingManager and a list of EncodedValue is necessary. And the best: this already works within the existing system and the new EncodingManager is compatible with the old.

I've created a EncodedValueFactory to create some default EncodedValues like maxspeed, surface, 'car access' etc so that useful Weightings can be created.

The main changes in this PR:

  1. long flags was replaced with a IntsRef that is not limited to 64bits (later on we could use this for all edge properties like distance and pointers, also there is a minor advantage that we could point directly into the storage backing array instead of copying but this is for a later PR)
  2. new directed EncodedValue that are able to store forward and backward values, no need for expensive reverseFlags method, much simpler usage in Weighting implementations, no need for flag encoders but still backward compatible to them

main goals of this PR:

  • edge storage should allow more than 64bits per edge
  • easier storing of edge properties -> EncodedValue methods
  • EncodingManager will be EncodedValues based
  • avoid reverseFlags

non-goals:

  • remove FlagEncoders
  • fix PathExtract
  • flexible storage for relation and node
  • solve problem with two bits for access restriction versus speed==0

Further work items:

  • rename MappedXY to more common DictionaryXY?
  • use new EncodedValue classes in old FlagEncoder classes: DataFlagEncoder, CarFlagEncoder, ...
  • merge master into this branch
  • what to do with the unfavoredEdge used only in QueryGraph -> do we need a 0-bit EncodedValue used just as a 'marker'?
  • use two EncodedValues (weight and access) for shortcuts in CHGraphImpl -> advantage would be that we would be able to store weight with higher precision
  • version of every EncodedValue to make loading graph more secure?
  • avoid Weighting.getFlagEncoder
  • replace AbstractFlagEncoder and FlagEncoder with something less ugly, separate turn and node flags (Profile == collection of EncodedValue or ComposedEncodedValues)
  • maybe rename all XY08 into XYOld or XY09 instead? Add a deprecation notice and removal with 1.x
  • remove all the TODOs and ugly hacks
  • copyProperties can be done via array copy and setting just the node indices afterwards
  • define TagParser order
  • OSMReader is currently doing duplicate work to calculate bits of one OSM way, instead it should calculate the bits once and just array copy to the other edges using edge.get&setData?
  • InstructionAnnotation getAnnotation -> remove this or replace by a ComposedEncodedValue that can warn about private, tool, ferry or ford access?
  • separate TurnWeighting handling -> how to do storage of OSM-node and OSM-relation information?
  • merge new EncodingManager with old
  • try a real world example and parse OSM with a car profile and query via the new weighting mechanism
  • use IntRefs instead of int flags
  • a directed DoubleProperty and BooleanProperty class that holds two values (one value for a direction) that can be easily 'reversed' -> ComposedEncodedValue holds multiple encoded values? support get and getReversed directly for every encodedValue?
  • should we introduce a PropertyFilter so that a maxspeed is not parsed and calculated if the highway tag is not accepted ('rail')?
  • move out EncodedValue.parse into something else to make support of non-OSM easier

Later:

  • allow cross-integer space for EncodedValue otherwise we can get bad constellations e.g. if EncodedValue has two directions and requires e.g. 17 bits
  • we could introduce a "reverse" EdgeExplorer or iter.setBaseNode(node, reverse=true) and then avoid the reverse parameter in Weighting.calcWeight as well as the if clauses to fetch e.g. the forward or the reverse speed. I.e. one edge is directed API-wise and undirected storage-wise. We should do this undirected vs. directed stuff in a later step, see also (WIP) EncodingManager refactoring #1112 (comment)
  • store distance as an EncodedValue
  • But in general we need length dependent mechanism somehow
  • reimplement the CarFlagEncoder as a ComposedEncodedValue?
  • PathExtract should be more separated (call extract explicitly after calcPath and potentially rename calcPath to calcShortestPathTree)
  • solve problem with two bits for access restriction versus speed==0; what to do if a reverse read-out of a oneway edge: exception vs. silently ok? how would we do the exception - via a new setting "read of default value results in exception" or somehow associating the access-EncodedValue with e.g. the max_speed-EncValue?

…actEncodedValue, added MappedDecimalEncodedValue, separated parsing from EncodedValue, introduced TagParserFactory
return encoder.isBool(edge.getFlags(), FlagEncoder.K_ROUNDABOUT);
}
};
} catch (Exception ex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling with this a bit. Why do we need the try/catch, especially with a generic exception? Has it something to do if there is a encoder stored in the weighting?

Wouldn't it be better to be able to read the state without creating an exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not care about it this much. It will&must disappear. Just wanted to make this backward compatible and working

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok :)

@@ -40,7 +40,7 @@ public BBox getBox() {
* to specify an explicit bounding box. TODO: support GeoJSON instead.
*/
public static final LandmarkSuggestion readLandmarks(String file, LocationIndex locationIndex) throws IOException {
// landmarks should be suited for all vehicles
// landmarks should be suited for all profiles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about PT :)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean public transit? In its current form A* should work and so also landmarks but not sure if it works nor if it makes it faster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was just wondering as it's stating "suited" meaning, it's a good idea to combine PT with LM. I am not sure if this is true, so I was asking.

/**
* This class provides easy access to just one bit.
*/
public final class BitEncodedValue extends IntEncodedValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, GH missed a Boolean Encoded Value, not sure if we should call it like this instead of Bit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the difference here because boolean implies 1 byte and double implies 8 bytes. For Integer I just didn't find a better word yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I meant if we should call it BooleanEncodedValue instead of bit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A boolean usually uses 1byte (roughly) in the JVM, so I wanted to differentiate here as we really use just 1 bit, but not that important and we can change to Bool or Boolean

@boldtrn
Copy link
Member

boldtrn commented Jul 21, 2017

Wow, this change looks impressive! Thanks!

I haven't found the time to give this a full in-depth review yet, I only touched the tip of the iceberg so far. My main concern until now is the "ugliness" that you asked to ignore. I would prefer a way to get the state without provoking an exception.

I think using exceptions as a "if replacement" should be avoided if possible. If not then it is what it is :).

@karussell
Copy link
Member Author

karussell commented Jul 21, 2017

I think using exceptions as a "if replacement" should be avoided if possible.

Yes, of course. But if statements wouldn't make it more beautiful. We need a different solution, but I'll improve here only if I can solve the followings things before: 0. tuning everything a bit more 1. having a better API for some places (e.g. simple set&get using and 'reverse' stuff) and 2. trying to import a real data set to see if this is actually fast enough and we can model something more realistic.

this.extendedDataSize = Math.min(1, extendedDataSize / 4) * 4;
}

public EncodingManager add(TagParser parser) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be only allowed when not initialized?

return extendedDataSize;
}

// TODO should we add convenient getters like getStringProperty etc?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make it more convenient IMHO.

*
* @return the storable format that can be read via fromStorageFormatToInt
*/
public final int toStorageFormat(boolean reverse, int flags, int value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are flags int and not long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is an array of ints as the underlying RAMDataAccess is an array of ints and so this is the most efficient. But I'm thinking of using a byte array instead, which would be required if the GH node or way IDs exceed the 2^32 integer boundaries. The best would be if we could encapsulate this a bit with something like the IntRefs class and pass this in this method, which would also avoid the need of the public getOffset method.

return uncheckToStorageFormat(reverse, flags, value);
}

final int uncheckToStorageFormat(boolean reverse, int flags, int value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about unchecked?

private final double precision;

/**
* TODO should we really use precision here or use something like the already used 'factor'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, precision is the commonly used word for Double, so this does not sound wrong to me.

toStorageMap = new IntIntHashMap(values.size());

int index = 0;
for (double val : values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it would change a lot, but should we sort the list of Doubles first? This would also allow us to accept Collections in general. On the other hand the current List approach might be a bit more robust against adding new elements at the end if no additional bit is required (not sure if this is actually a use case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah BTW, I think we should also check if any double occurs twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicates checking makes sense. Regarding the sorting: using Collection instead of a List is already possible now.

super(name, (int) Long.highestOneBit(values.size()));

// we want to use binarySearch so we need to sort the list
// TODO should we simply use a separate Map<String, Int>?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think Map might be faster for larger collections and maybe even easier to use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet a map is only faster for relative big collections (>30 entries) ... we should benchmark to make this argument ;). "Easier to use" is not an argument here as we can hide it in this class IMO.

String getName();

// TODO Every tag parser has an EncodedValue associated but except for convenient usage in EncodingManager we currently do not need this method
EncodedValue getEncodedValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't hurt but makes the code easier 👍

import com.graphhopper.routing.util.AbstractFlagEncoder;
import com.graphhopper.util.EdgeIteratorState;

public class TagParserFactory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this factory a bit of an overkill? Wouldn't it be easier to put this into classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with 'put this into classes'? Creating a RoundaboutTagParser and a HighwayTagParser etc? (This would introduce a lot more code/overhead IMO :))

I'm not super happy with it now but for me this factory was just convenient to avoid creating the same parsers in tests again and again.

@boldtrn
Copy link
Member

boldtrn commented Jul 22, 2017

We need a different solution, but I'll improve here only if I can solve the followings things before: 0. tuning everything a bit more 1. having a better API for some places (e.g. simple set&get using and 'reverse' stuff) and 2. trying to import a real data set to see if this is actually fast enough and we can model something more realistic.

Sounds good and makes sense. I just saw it when browsing through the code and thought I might add my two cents.

@karussell
Copy link
Member Author

I was able to fix a few things (regarding highway filter and oneway storage) so that full Germany is now properly imported (in 6min) and possible to route via car. The following config.properties is necessary

graph.flag_encoders=car
prepare.ch.weightings=no
prepare.min_network_size=0
prepare.min_one_way_network_size=0

and the following parameters are necessary and currently no bidirectional algorithm will work:

vehicle=weighting&weighting=fastest2&algorithm=astar

Such an uni-dir route request from south to nord takes ~6sec. Not too bad. Only a smaller fraction (~25%) of astar.runAlgo seems to be related to the new 'dynamic' FastestCarWeighting which could be probably further reduced via reusing the IntsRef for the next edges.

Somehow the instructions have no street name associated. Will investigate.

The applyWayTags is currently very lightweight and accounts to 0% of CPU time in the profiler.

@HendrikLeuschner
Copy link

Sure.
These are the different builders: Builders
They are employed via our custom ORSOSMReader which in turn calls the Processingcontext, which is a custom class, where the processWay function calls the builders of all enabled processors.
Currently we have changed the normal OSMReader so that it contains an empty method onProcessWay that is extended in our custom ORSOSMReader. We would choose a similar approach where we employ the calls in the Reader?

@karussell
Copy link
Member Author

I do not understand why a new layer (GraphProcessContext) and the changes to the OSMReader is necessary, also the naming is unspecific.

But creating separate classes that (if enabled) add multiple TagParsers to the EncodingManager sounds reasonable. Of course renaming TagParsers to something else is still possible IMO this responsibility sounds very similar to the old FlagEncoder concept except that there is now a need to avoid duplication (e.g. parsing&storing maxspeed is required for car and truck but should be stored only once). And this hierarchical aspect could be employed, so we end up in a class hierarchy instead of one big TagParserFactory class.

@HendrikLeuschner
Copy link

HendrikLeuschner commented Apr 4, 2018

I do not understand why a new layer (GraphProcessContext) and the changes to the OSMReader is necessary, also the naming is unspecific.

Sorry, I was not clear on this. This is how we do it right now. I will ditch both the Reader and the GraphProcessContext and implement everything in your existing classes + the new classes for separate TagParsers.

@karussell
Copy link
Member Author

Ah, no problem. Sounds great, thanks :) !

@HendrikLeuschner
Copy link

HendrikLeuschner commented Apr 10, 2018

I've rewritten the structure a bit. All TagParsers (though I have not added all yet) now reside in their respective own class: Tagparsers
The needed attributes (like surfaces list) are integrated to ensure good modularity.
The TagParserFactory now does not define profiles anymore but only creates single modules. The overview over enabled parsers and the profile definitions are now inside encodingmanager via a simple list of which parsers should be enabled. The Encodingmanager also checks whether a parser has already been added and has methods to add bikeParsers, footParsers and carParsers. I don't really like putting the profiles in here, I think they should be moved to a different class or configuration file. So far I also do not know how enabling the different profiles should be handled. I only found a usage of addGlobalEncodedValues.
So far I find this structure much easier to understand and modify, but I wanted to get your opinion(s) before I continue further.

@karussell
Copy link
Member Author

karussell commented Apr 10, 2018

Yes, this looks like a good way to go forward.

Now ỳou could remove most of the methods in TagParserFactory or maybe the whole class?

The Encodingmanager also checks whether a parser has already been added and has methods to add bikeParsers, footParsers and carParsers

Is such a check necessary if we have the duplicate check for the EncodedValue objects already?

I don't really like putting the profiles in here, I think they should be moved to a different class or configuration file.

Yes, think so too. But let's fix more important stuff first. These lists should be made private or are the needed outside of it?

BTW: in ~2 weeks I'll finally be able to increase priority and push this further faster :)

@HendrikLeuschner
Copy link

HendrikLeuschner commented Apr 11, 2018

Now ỳou could remove most of the methods in TagParserFactory or maybe the whole class?

Yes, this is now mostly done. Entire thing is of course still a work in progress.

Is such a check necessary if we have the duplicate check for the EncodedValue objects already?

No, you are right.

I fixed some things and added most of the remaining parsers, except for a few where I was not yet sure how to handle it (namely SpatialRuleId). Cleaned up some of the files too. I will continue to work on this.

@HendrikLeuschner
Copy link

Hi @karussell , are there in principal different encoded values for all profiles, meaning e.g. bike and bike2 have different encodedValues for average_speed and access etc? Including different tag parsers?
Running into this problem while running the tests. One of them fails while trying to find the ev bike2.access for the Bike2WeightFlagEncoder.
If they are shared, there is a need to distinguish the parsers too, possibly without duplicating them.

@karussell
Copy link
Member Author

If they are shared, there is a need to distinguish the parsers too, possibly without duplicating them

To make it backward compatible average_speed and access values cannot be shared, yes. E.g. bike2 modifies the speed due to elevation and racing bike is faster, but also access is different as mtb can use paths with a more difficult grade for example. Although I can imagine that making the access shared across bikes should be possible as one can use the Weighting to just avoid these paths (but the space requirement is very low and so this optimization should be done later)

In the future one could further try to calculate average_speed dynamically from other generic values but especially for non-CH use cases a precomputed value for both is faster while query.

@HendrikLeuschner
Copy link

I have not yet created all EncodedValues for all profiles, e.g. MotorcycleFlagEncoder. Therefore I have more failing tests than before (Without my changes its 142, with them its 216). I don't know when or if I will have the time to do add every remaining profile. Would you like me to stage a pull request anyway? Sorry for all the questions.

@karussell
Copy link
Member Author

Therefore I have more failing tests than before (Without my changes its 142, with them its 216).

I'm not sure if I understand this but a PR will make it clear probably.

I don't know when or if I will have the time to do add every remaining profile.

Sure

Would you like me to stage a pull request anyway?

Yes, please. You can add [WIP]

Sorry for all the questions.

No need for that really :)

@HendrikLeuschner
Copy link

Hi,

any update on this so far?
Best, Hendrik

@karussell
Copy link
Member Author

karussell commented May 15, 2018

I'm currently in the process of defining what should go in this PR and what not.

Here are the main goals of this PR:

  • edge storage should allow more than 64bits per edge
  • easier storing of edge properties -> EncodedValue methods
  • EncodingManager will be EncodedValues based
  • avoid reverseFlags

non-goals:

  • remove FlagEncoders
  • fix PathExtract
  • flexible storage for relation and node
  • solve problem with two bits for access restriction versus speed==0

And so I will remove the AccessValue refactoring and create a separate PR, see this separate branch: https://github.com/graphhopper/graphhopper/tree/access_refactoring

Also I'll remove the DefaultEdgeFilter refactoring as it is now in master.

Furthermore the json refactoring won't happen in this PR: #1291 (comment) and we'll use a string based EncodedValue properties storage somehow.

Now I'll try to merge your PR and rebase the changes against the access_refactoring branch. As a first step after this bigger change I'll try to fix the performance problem (currently we are 2x slower) and in parallel we could work on making more tests green.

@HendrikLeuschner
Copy link

Currently osm tags are only processed before edges are created via parse(IntsRef ints, ReaderWay way). For some of our information we need a parser that is called after edge creation, such as parse(EdgeIteratorState edge, ReaderWay way).
Do you think we could include something like this in the osm reader after edge creation?

On a different node, the optimal case for us would be to have a possibility of defining TagParsers on the fly even from outside the graphhopper repo. This should be some sort of method like TagParserFactory.add(TagParser p) or equivalent in EncodingManager, so that we can maintain our own parsers. I think this would be optimal for other projects built on graphhopper too.

@karussell
Copy link
Member Author

Do you think we could include something like this in the osm reader after edge creation?

What would be a use case? Why couldn't you do a post processing after the graph is created?

On a different node, the optimal case for us would be to have a possibility of defining TagParsers on the fly even from outside the graphhopper repo.

This should be already possible, but instead of TagParserFactory you would add it to the EncodingManager (?)

@karussell
Copy link
Member Author

karussell commented Aug 9, 2018

This PR is too big. I'll split it into separate ones:

  • access_refactoring Access refactoring #1436
  • move from long to IntsRef and avoid that performance will suffer, maybe fetch ints for the whole edge at once (intsref_refactoring branch)
  • EncodingManager will be EncodedValues based and we avoid calling reverseFlags for every edge access

@karussell
Copy link
Member Author

Closing in favor of #1447

@karussell karussell closed this Aug 28, 2018
@karussell karussell modified the milestone: 0.12 Feb 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactoring to reduce set/getFlags usage
5 participants